Author Details

Jayanthi, S. K.

Comprehensive Evaluation of Machine Learning Techniques and Novel Features for Web Link Spamdexing Detection

Abstract Views :461 | PDF Views:264

Authors

S. K. Jayanthi ¹, S. Sasikala ², J. P. Vishnupriya ³

Affiliations
1 Department of Computer Science, Vellalar College for Women, Erode, IN
2 Department of Computer Science, KSR College of Arts and Science, Tiruchengode, IN
3 Kumaraguru College of Technology, Coimbatore, IN

Source

ScieXplore: International Journal of Research in Science, Vol 1, No 2 (2014), Pagination: 98-109

Abstract

The World Wide Web (WWW) is a huge, dynamic, self-organized, and strongly interlinked source of information. Search engines have become a vital Information Retrieval (IR) system for finding the required information. Results appearing in the first few pages attract more attention and carry more importance, since users believe that their top positions make them more relevant. Spamdexing plays a key role in giving an undeserving page a high rank and top visibility. This paper focuses on two aspects: new features and new classifiers. First, 27 new features that are used commercially to boost ranking and reputation are considered for classification. Along with them, 17 new features are proposed and computed. In total, 44 features are combined with the existing WEBSPAM-UK 2007 dataset, which is the baseline. With all these features, a feature inclusion study is carried out to improve performance. The second aspect is the exploration of a new suite of five machine learners for the web spam classification problem. Results are discussed. Including the new features improves the classification accuracy obtained with the publicly available WEBSPAM-UK 2007 features by 22%. SVM outperforms the other methods in terms of accuracy.

Keywords

Decision Table, HMM, Search Engine, SVM, Web Spam.

Edge Detection Using Multispectral Thresholding

Abstract Views :139 | PDF Views:2

Authors

K. P. Sivagami ¹, S. K. Jayanthi ², S. Aranganayagi ¹, A. Geetha ¹

Affiliations
1 Department of Computer Science, J.K.K. Nataraja College of Arts & Science, IN
2 Department of Computer Science, Vellalar College for Women, IN

Source

ICTACT Journal on Image and Video Processing, Vol 6, No 4 (2016), Pagination: 1267-1272

Abstract

Edge detection is a fundamental tool in image processing and computer vision, particularly in the areas of feature detection and extraction. Among the various edge detection methods, the Otsu method is one of the best optimal thresholding methods for general real-world images with regard to uniformity and shape measures. In this paper, a multispectral thresholding algorithm using the Otsu method is proposed to detect the edges in multispectral images. Natural, art, and simulated images are considered for testing. Since the edges are well known in the simulated images, those images are used for performance evaluation. The results of the proposed method, Edge Detection using MultiSpectral Thresholding (EDMST), are compared against the results of the Canny Otsu, Improved Otsu, Median based Otsu, and Improved Gray Image Otsu edge detection algorithms on the basis of the human visual system, the number of edges, and the number of pixels. The experimental results show that the proposed method achieves better performance, and it is therefore applied to satellite images.
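As a rough illustration of the thresholding step this abstract builds on, the sketch below implements the classic single-band Otsu method, which picks the gray level that maximizes the between-class variance. It is not the authors' EDMST algorithm: the function name `otsu_threshold`, the flat list input, and the 256-level assumption are hypothetical choices made for this example only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat iterable of integer intensities in [0, levels).
    Pixels <= t form one class, pixels > t the other.
    Illustrative single-band Otsu only, not the multispectral EDMST method.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_low = 0      # number of pixels with intensity <= t
    sum_low = 0         # sum of intensities of those pixels
    best_t, best_var = 0, -1.0

    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue            # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break               # every pixel is already in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy two-level "image": half dark, half bright (made-up values).
toy_image = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_image)
```

For a multispectral image, one simple option would be to run such a function once per band, but how the bands are combined in EDMST is not stated in the abstract.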

Keywords

Edge Detection, Multispectral Thresholding, Otsu Method, Satellite Images, EDMST.

NLSDF for Boosting the Recital of Web Spamdexing Classification

Abstract Views :192 | PDF Views:0

Authors

S. K. Jayanthi ¹, S. Sasikala ²

Affiliations
1 Department of Computer Science, Vellalar College for Women, IN
2 Department of Computer Science, Hindusthan College of Arts and Science, IN

Source

ICTACT Journal on Soft Computing, Vol 7, No 1 (2016), Pagination: 1324-1331

Abstract

Spamdexing is the art of black hat SEO. Features that are more influential for high rank and visibility are manipulated for the SEO task. The motivation behind this work is to use state-of-the-art website optimization features to enhance the performance of spamdexing detection. Features that play a focal role in current SEO strategies show a significant deviation between spam and non-spam samples. This paper proposes 44 features, named NLSDF (New Link Spamdexing Detection Features). Social media has an impact on the ranking of search engine results, so features pertaining to social media are incorporated with the NLSDF features to boost the performance of spamdexing classification. The 44 NLSDF attributes, together with 5 social media features, boost the classification performance on the WEBSPAM-UK 2007 dataset. A one-tailed paired t-test at the 95% confidence level, performed on the AUC values of the learning models, shows the significance of NLSDF.
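The significance check mentioned at the end of this abstract can be sketched as follows. This is a minimal example assuming paired AUC values for the same learners with and without the new features; the AUC numbers are made-up placeholders, not results from the paper, and `paired_t_statistic` is a hypothetical helper name. The critical value 2.132 is the standard one-tailed 5% point of the t distribution with 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for the paired differences after - before."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)          # sample standard deviation
    std_error = std_diff / math.sqrt(len(diffs))
    return mean_diff / std_error, len(diffs) - 1


# Hypothetical AUC values for five learners (placeholders for illustration).
auc_baseline = [0.70, 0.72, 0.68, 0.75, 0.71]
auc_with_new_features = [0.78, 0.80, 0.74, 0.83, 0.77]

t_stat, dof = paired_t_statistic(auc_baseline, auc_with_new_features)

# One-tailed test at the 5% level: reject "no improvement" if t exceeds
# the critical value for the given degrees of freedom (2.132 for dof = 4).
T_CRITICAL_ONE_TAILED_DOF4 = 2.132
improvement_is_significant = t_stat > T_CRITICAL_ONE_TAILED_DOF4
```

The test is paired because each learner is evaluated under both feature sets, so the per-learner differences are what carry the evidence.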

Keywords

Web Spam, Search Engine, SVM, Decision Table, HMM.

Web Link Spam Identification Inspired by Artificial Immune System and the Impact of TPP-FCA Feature Selection on Spam Classification

Abstract Views :156 | PDF Views:0

Authors

S. K. Jayanthi ¹, S. Sasikala ²

Affiliations
1 Department of Computer Science, Vellalar College for Women, IN
2 Department of Computer Science, K.S.R College of Arts and Science, IN

Source

ICTACT Journal on Soft Computing, Vol 4, No 1 (2013), Pagination: 633-644

Abstract

Search engines are the doorsteps for retrieving required information from the web. Web spam is an illegitimate method of improving the ranking and visibility of web pages in search engine results. This paper addresses the problem of link spam classification through the features of websites. Link-related features retrieved from a website are used to discriminate between spam and non-spam sites. AIS-inspired algorithms are applied to the dataset and the results are evaluated. Artificial immune systems are machine learning systems inspired by the principles of natural immunology. They comprise supervised learning schemes that can be adapted to a wide range of classification problems. The WEBSPAM-UK 2007 dataset [8] is used for the experiments, and WEKA [9] is used to simulate the classifiers. The Artificial Immune Recognition algorithm appears to perform better than the others. The best classification accuracy attained is 98.89, by the AIRS1 algorithm. This compares well with the classifier accuracies reported in the existing literature.

Keywords

Web Spam, Search Engine, TPP, FCA, AIRS.

Measuring the Performance of Similarity Propagation in an Semantic Search Engine

Abstract Views :169 | PDF Views:0

Authors

S. K. Jayanthi ¹, S. Prema ²

Affiliations
1 Department of Computer Science, Vellalar College for Women, IN
2 Department of Computer Science, K.S.R College of Arts and Science, IN

Source

ICTACT Journal on Soft Computing, Vol 4, No 1 (2013), Pagination: 667-672

Abstract

In the current scenario, personalization of web page results plays a vital role. Nearly 80% of users expect the best results on the first page itself and do not have the persistence to browse longer in URL mode. This research work focuses on two main themes: online semantic web search and offline domain-based search. The first part seeks an effective method for grouping similar results together using the BookShelf Data Structure and for organizing the various clusters. The second part focuses on offline search in the academic domain. This paper focuses on finding documents that are similar and on how the vector space model can be used to do so. More weight is therefore given to the principles and working methodology of similarity propagation. The cosine similarity measure is used to find the relevance among the documents.
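The cosine similarity measure named in this abstract can be illustrated with a small sketch. It assumes a plain bag-of-words term-frequency vector per document; the tokenization (lowercase, whitespace split) and the function names are assumptions for this example, and the paper's actual weighting scheme is not given in the abstract.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector (term -> count) from raw text."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0              # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


doc_a = term_vector("semantic web search")
doc_b = term_vector("semantic search engine")
similarity = cosine_similarity(doc_a, doc_b)   # two of three terms shared
```

A score near 1 means the two documents use nearly the same terms in the same proportions, and a score of 0 means they share no terms at all.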

Keywords

Semantic Web, BookShelf Data Structure, Similarity Propagation, Cosine Similarity measure, Vector Space Model.

Reptree Classifier for Identifying Link Spam in Web Search Engines

Abstract Views :151 | PDF Views:0

Authors

S. K. Jayanthi ¹, S. Sasikala ²

Affiliations
1 Department of Computer Science, Vellalar College for Women, IN
2 Department of Computer Science, KSR College of Arts and Science, IN

Source

ICTACT Journal on Soft Computing, Vol 3, No 2 (2013), Pagination: 498-505

Abstract

Search engines are used for retrieving information from the web. Most of the time, importance is placed on the top 10 results, and sometimes this shrinks to the top 5, because of time constraints and reliance on the search engines. Users believe that the top 10 or 5 of the total results are the most relevant. Here the problem of spamdexing arises: it is a method of degrading search result quality. Falsified metrics, such as inserting an enormous number of keywords or links into a website, may take that website to the top 10 or 5 positions. This paper proposes a classifier based on Reptree (Regression tree representative). As an initial step, link-based features such as neighbors, PageRank, truncated PageRank, TrustRank, and assortativity-related attributes are inferred. Based on these features, a tree is constructed. The tree uses feature inference to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is taken as the base. It is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD, and FEATE. Only link-based features are used in the experiments, since this paper focuses on link spam alone. Finally, a representative tree is created that classifies web spam entries more precisely. Results are given. The experiments indicate that regression tree classification performs well.

Keywords

Web Link Spam, Classification, Reptree, Decision Tree, Search Engine.

Improving Personalized Web Search Using Bookshelf Data Structure

XGraphtics^CLUS: Web Mining Hyperlinks and Content of Terrorism Websites for Homeland Security

Abstract Views :135 | PDF Views:0

Authors

S. K. Jayanthi ¹, S. Sasikala ²

Affiliations
1 Computer Science Department, Vellalar College for Women, Erode, IN
2 Computer Science Department, KSR College of Arts and Science, Tiruchengode, IN

Source

International Journal of Advanced Networking and Applications, Vol 2, No 6 (2011), Pagination: 941-949

Abstract

The World Wide Web has become one of the best and fastest communication media, and information can be distributed to the world within a few seconds. The evolution of social networking media has further increased the speed at which information reaches ordinary people. Terrorist organizations exploit these facets of the web very efficiently for their destructive plans. Understanding web data is a decisive task in gaining a better understanding of a website. This paper focuses on mining the content and link structure of a suspicious website through XGraphtics^CLUS. This is done by viewing the web as a graph and retrieving the various content of the website. It could help in better understanding the motive and the various other web connections of the suspected site. The navigational links offered by the website could yield informative evidence. This paper takes a step towards national security and gives the user a clearer view.

Keywords

Hyperlinks, Content, Mining, Terrorism Websites.

Informatics Publishing Limited
